Quality Estimation of Machine Translated Texts based on Direct Evidence from Training Data
Kumari, Vibhuti, Kavi, Narayana Murthy
Current Machine Translation systems achieve very good results on a growing variety of language pairs and data sets. However, it is now well known that they produce fluent translation outputs that can nevertheless contain serious meaning errors. The Quality Estimation task deals with estimating the quality of translations produced by a Machine Translation system without depending on reference translations. A number of approaches have been suggested over the years. In this paper we show that the parallel corpus used as training data for the MT system holds direct clues for estimating the quality of the translations that system produces. Our experiments show that this simple and direct method holds promise for quality estimation of translations produced by any purely data-driven machine translation system.
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
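The abstract's core idea can be illustrated with a small sketch: score a translation by how much of it is directly attested in the target side of the MT system's training data. The n-gram coverage scheme below is an invented illustration of that idea, not the paper's exact formulation.

```python
# Hypothetical sketch: estimate translation quality from direct evidence in
# the training corpus, measured here as n-gram coverage. Scoring scheme is
# illustrative only.

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def coverage_score(translation, target_corpus, max_n=3):
    """Fraction of the translation's 1..max_n-grams attested in the
    target side of the MT system's training corpus."""
    seen = set()
    for sentence in target_corpus:
        toks = sentence.split()
        for n in range(1, max_n + 1):
            seen.update(ngrams(toks, n))
    out = translation.split()
    candidates = [g for n in range(1, max_n + 1) for g in ngrams(out, n)]
    if not candidates:
        return 0.0
    return sum(g in seen for g in candidates) / len(candidates)

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
print(coverage_score("the cat sat on the rug", corpus))   # fully attested
print(coverage_score("the cat sat on the sofa", corpus))  # partly attested
```

A low coverage score flags output n-grams the training data provides no direct evidence for, which is the kind of clue the paper exploits.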
Frequency effects in Linear Discriminative Learning
Heitmeier, Maria, Chuang, Yu-Ying, Axen, Seth D., Baayen, R. Harald
Word frequency is a strong predictor in most lexical processing tasks. Thus, any model of word recognition needs to account for how word frequency effects arise. The Discriminative Lexicon Model (DLM; Baayen et al., 2018a, 2019) models lexical processing with linear mappings between words' forms and their meanings. So far, the mappings can either be obtained incrementally via error-driven learning, a computationally expensive process able to capture frequency effects, or in an efficient, but frequency-agnostic closed-form solution modelling the theoretical endstate of learning (EL) where all words are learned optimally. In this study we show how an efficient, yet frequency-informed mapping between form and meaning can be obtained (Frequency-informed learning; FIL). We find that FIL well approximates an incremental solution while being computationally much cheaper. FIL shows a relatively low type- and high token-accuracy, demonstrating that the model is able to process most word tokens encountered by speakers in daily life correctly. We use FIL to model reaction times in the Dutch Lexicon Project (Keuleers et al., 2010) and find that FIL predicts well the S-shaped relationship between frequency and the mean of reaction times but underestimates the variance of reaction times for low frequency words. FIL is also better able to account for priming effects in an auditory lexical decision task in Mandarin Chinese (Lee, 2007), compared to EL. Finally, we used ordered data from CHILDES (Brown, 1973; Demuth et al., 2006) to compare mappings obtained with FIL and incremental learning. The mappings are highly correlated, but with FIL some nuances based on word ordering effects are lost. Our results show how frequency effects in a learning model can be simulated efficiently by means of a closed-form solution, and raise questions about how to best account for low-frequency words in cognitive models.
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Africa > Kenya > Mandera County > Mandera (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
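The EL-vs-FIL contrast in the abstract can be made concrete with a toy example: the frequency-agnostic endstate mapping is ordinary least squares between form and meaning matrices, while a frequency-informed closed-form mapping can be approximated as frequency-weighted least squares. This is a simplified illustration of the idea, not the paper's exact estimator, and all matrices below are random stand-ins.

```python
import numpy as np

# Toy contrast: endstate-of-learning (EL) mapping vs. a frequency-informed
# mapping (FIL), approximated here as frequency-weighted least squares.

rng = np.random.default_rng(0)
n_words, form_dim, sem_dim = 20, 8, 4
C = rng.normal(size=(n_words, form_dim))   # form (cue) matrix
S = rng.normal(size=(n_words, sem_dim))    # meaning (semantic) matrix
freq = np.geomspace(1000, 1, n_words)      # Zipf-like token frequencies

# EL: ordinary least squares, every word type weighted equally.
F_el = np.linalg.pinv(C) @ S

# FIL-style mapping: weight each word's row by its token frequency, so
# frequent words are mapped more faithfully than rare ones.
W = C.T * freq                             # == C.T @ diag(freq)
F_fil = np.linalg.solve(W @ C, W @ S)

err_el = np.sum((C @ F_el - S) ** 2, axis=1)
err_fil = np.sum((C @ F_fil - S) ** 2, axis=1)
print("frequency-weighted loss, EL :", float(freq @ err_el))
print("frequency-weighted loss, FIL:", float(freq @ err_fil))
```

The weighted mapping achieves a lower frequency-weighted loss (high token-accuracy) at the cost of a higher unweighted loss over word types, mirroring the low type- and high token-accuracy pattern the abstract reports.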
The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings
Valentini, Francisco, Rosati, Germán, Slezak, Diego Fernandez, Altszyler, Edgar
Numerous works use word embedding-based metrics to quantify societal biases and stereotypes in texts. Recent studies have found that word embeddings can capture semantic similarity but may be affected by word frequency. In this work we study the effect of frequency when measuring female vs. male gender bias with word embedding-based bias quantification methods. We find that Skip-gram with negative sampling and GloVe tend to detect male bias in high frequency words, while GloVe tends to return female bias in low frequency words. We show these behaviors still exist when words are randomly shuffled. This proves that the frequency-based effect observed in unshuffled corpora stems from properties of the metric rather than from word associations. The effect is spurious and problematic since bias metrics should depend exclusively on word co-occurrences and not individual word frequencies. Finally, we compare these results with the ones obtained with an alternative metric based on Pointwise Mutual Information. We find that this metric does not show a clear dependence on frequency, even though it is slightly skewed towards male bias across all frequencies.
- North America > United States (0.14)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.05)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
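The PMI-based alternative mentioned at the end of the abstract can be sketched as a co-occurrence association score: the difference PMI(w, "she") − PMI(w, "he"). The toy corpus and window handling below are invented for illustration and do not reproduce the paper's setup.

```python
import math
from collections import Counter

# Minimal sketch of a PMI-based gender association score:
# bias(w) = PMI(w, "she") - PMI(w, "he"), from raw co-occurrence counts.

corpus = [
    "she is a nurse", "he is a doctor",
    "she is a doctor", "he is an engineer",
]

def counts(corpus, window=4):
    pair, uni, total = Counter(), Counter(), 0
    for sent in corpus:
        toks = sent.split()
        uni.update(toks)
        total += len(toks)
        for i, w in enumerate(toks):
            context = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            for c in context:
                pair[(w, c)] += 1
    return pair, uni, total

def pmi(w, c, pair, uni, total):
    if pair[(w, c)] == 0:
        return float("-inf")  # unseen pair; unsmoothed for simplicity
    return math.log(pair[(w, c)] * total / (uni[w] * uni[c]))

def gender_bias(w, pair, uni, total):
    """Positive -> female-associated, negative -> male-associated."""
    return (pmi(w, "she", pair, uni, total)
            - pmi(w, "he", pair, uni, total))

pair, uni, total = counts(corpus)
print(gender_bias("doctor", pair, uni, total))
```

Because the score depends only on co-occurrence ratios, not on a learned embedding, it avoids the training-induced frequency artifacts the abstract describes.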
Introduction to Automatic Text Summarization
Sifting through lots of documents can be difficult and time-consuming. Without an abstract or summary, it can take minutes just to figure out what the heck someone is talking about in a paper or report. And if you need to get through hundreds of documents – good luck. A summarizer is an algorithm that extracts sentences from a text document, determines which are most important, and returns them in a readable and structured way. Automatic text summarization is part of the field of natural language processing, which is how computers analyze, understand, and derive meaning from human language.
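The extract-score-return pipeline just described can be sketched in a few lines. The word-frequency scoring below is one classic heuristic (in the spirit of Luhn's method); real summarizers use more sophisticated scores.

```python
import re
from collections import Counter

# Minimal extractive summarizer: score each sentence by the average corpus
# frequency of its words, return the top sentences in original order.

def summarize(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sent):
        toks = re.findall(r"[a-z']+", sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Re-emit in document order so the summary stays readable.
    return " ".join(s for s in sentences if s in top)

text = "Dogs bark. Cats meow. Dogs and cats and dogs play."
print(summarize(text, 1))
```

The key design choice is the sentence score; swapping in TF-IDF weights or graph centrality (as in TextRank) changes which sentences are extracted, not the overall pipeline.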
Comparison Between Global Vs Local Normalization of Tweets, and Various Distances
From the text mining literature, it appears that practitioners tend to use Cosine Distance to compare two documents, and they have used it with great success. In our previous blog post, we also used Cosine Distance and found it extremely helpful: it allowed us, and our clustering method, to get an insight into the UK Exit Referendum. Here, we decided to change our initial conditions and see if we get different outcomes; i.e., we decided to try four other distances: Jaccard, Matching, Rogers-Tanimoto and Euclidean.
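For binary document vectors, the four distances named above have simple closed forms in terms of the usual contingency counts (a = both 1, d = both 0, b and c = mismatches). A self-contained sketch:

```python
import math

# The four distances above, on binary (0/1) vectors. Definitions match the
# standard binary dissimilarities (as in scipy.spatial.distance).

def contingency(u, v):
    a = sum(1 for x, y in zip(u, v) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(u, v) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(u, v) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(u, v) if x == 0 and y == 0)
    return a, b, c, d

def jaccard(u, v):
    a, b, c, _ = contingency(u, v)
    return (b + c) / (a + b + c) if a + b + c else 0.0

def matching(u, v):  # a.k.a. simple matching, the Hamming proportion
    a, b, c, d = contingency(u, v)
    return (b + c) / (a + b + c + d)

def rogers_tanimoto(u, v):
    a, b, c, d = contingency(u, v)
    return 2 * (b + c) / (a + d + 2 * (b + c))

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

u, v = [1, 1, 0, 1, 0], [1, 0, 0, 1, 1]
print(jaccard(u, v), matching(u, v), rogers_tanimoto(u, v), euclidean(u, v))
```

Note the key difference: Jaccard ignores shared absences (the d count), while Matching and Rogers-Tanimoto reward them, which matters a lot for sparse tweet vectors where most entries are 0.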
Comparison Between Global Vs Local Normalization of Tweets, and Various Distances
In the previous example we used clustering to see whether an apparent pattern exists within Brexit tweets. We found that there are three distinct patterns: the leave, the referendum, and Brexit. This in itself suggests that we might even create a classifier that can identify whether the tweet writer is pro or against an issue automatically, with no human intervention. Let's get back to the issues related to clustering. To use the clustering algorithm we had to map two tweets at a time to a binary vector.
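The mapping of a tweet pair to binary vectors can be sketched as follows; the tokenisation here is deliberately naive (whitespace split), whereas the original pipeline may have done more preprocessing.

```python
# Map two tweets to binary presence/absence vectors over their joint
# vocabulary, the representation the distance functions operate on.

def binary_vectors(tweet_a, tweet_b):
    toks_a, toks_b = set(tweet_a.split()), set(tweet_b.split())
    vocab = sorted(toks_a | toks_b)
    va = [1 if w in toks_a else 0 for w in vocab]
    vb = [1 if w in toks_b else 0 for w in vocab]
    return vocab, va, vb

vocab, va, vb = binary_vectors("vote leave now", "vote remain now")
print(vocab)   # ['leave', 'now', 'remain', 'vote']
print(va, vb)  # [1, 1, 0, 1] [0, 1, 1, 1]
```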
Human Reading and the Curse of Dimensionality
Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities. An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1). OCR systems classify one character at a time, while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979), and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, and Wolverton, 1984; Reicher, 1969).
- North America > United States > Kansas (0.06)
- North America > United States > Texas > Travis County > Austin (0.04)
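The dimensionality gap the abstract describes is easy to quantify: a single-character classifier faces at most one alphabet's worth of categories, while the space of possible character strings grows exponentially with string length. The numbers below are a back-of-envelope illustration, not figures from the paper.

```python
# Number of distinct letter strings of each length over a 26-letter
# alphabet: the input space a per-fixation classifier would face.
alphabet = 26
for length in (1, 5, 10, 13):
    print(length, alphabet ** length)
```

At the 8-13 characters per fixation cited above, the raw input space dwarfs what any OCR-style classifier could enumerate, which is why the paper's proposed constraints (consistent fixation positions, sequence regularities) are needed to shrink it.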